First Stages of SEO: Benchmarking Current Indexing Status

11/28/2010 6:59:30 PM

The search engines have an enormous task: indexing the world’s online content, or at least most of it. In practice they try hard to discover all of it, but they choose not to include all of it in their index. A page can be left out for a variety of reasons, such as being inaccessible to the spider, being penalized, or not having enough link juice to merit inclusion.

When you launch a new site, add new sections to an existing site, or manage a very large site, not every page will necessarily make it into the index. To get a handle on this, you will want to actively track the indexing level of your site. If your site is not fully indexed, it could be a sign of a problem (not enough links, poor site structure, etc.).

Getting basic indexation data from search engines is pretty easy. All three major search engines support the same basic query syntax for this: site:yourdomain.com. Figure 1 shows a sample of the output from Bing.

Figure 1. Indexing data from Bing

Keeping a log of the level of indexation over time can help you understand how things are progressing. This can take the form of a simple spreadsheet.
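The spreadsheet described above can be as simple as a CSV file with one row per check. The following is a minimal sketch, assuming you read the result count off the site: query by hand and record it yourself; the function names, the column names, and the file path are all hypothetical choices made for illustration, not part of any search engine tool.

```python
import csv
from datetime import date
from pathlib import Path

# Hypothetical column layout for the indexation log.
FIELDS = ["date", "engine", "indexed_pages"]


def log_index_count(path, engine, indexed_pages, day=None):
    """Append one manually observed site: result count to a CSV log."""
    path = Path(path)
    # Write the header only when the file is new or empty.
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "date": (day or date.today()).isoformat(),
            "engine": engine,
            "indexed_pages": int(indexed_pages),
        })


def read_index_log(path):
    """Return the logged rows, with the page count converted back to int."""
    with Path(path).open(newline="") as f:
        return [
            {**row, "indexed_pages": int(row["indexed_pages"])}
            for row in csv.DictReader(f)
        ]
```

A call such as log_index_count("index_log.csv", "bing", count), where count is the number you observed, adds one dated row. The resulting file opens directly in any spreadsheet program for charting.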

Related to indexation is the crawl rate of the site. Google provides this data in Google Webmaster Tools, part of Google Webmaster Central. Figure 2 shows a screenshot representative of the crawl rate charts that are available (another chart, not shown here, displays the average time spent downloading a page on your site).

Figure 2. Crawl data from Google Webmaster Tools

Short-term spikes are not a cause for concern, nor are periodic drops in levels of crawling. What is important is the general trend. In Figure 2, the crawl rate seems to be drifting upward. This bodes well for both rankings and indexation.
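If you export or transcribe the daily crawl counts, one way to judge the general trend without being misled by spikes is a Theil-Sen slope estimate: the median of the slopes between every pair of observations. This is a sketch of one possible technique, not something Google's charts compute for you; the function name is hypothetical, and the input is assumed to be a list of evenly spaced counts in chronological order.

```python
from itertools import combinations
from statistics import median


def theil_sen_slope(values):
    """Median of pairwise slopes between evenly spaced observations.

    A positive result suggests an upward drift, a negative one a decline.
    Because it is a median, a few isolated spikes or drops have little
    effect on the estimate.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(enumerate(values), 2)
    ]
    return median(slopes)
```

The result is expressed in pages crawled per interval, so a value near zero indicates a flat crawl rate. For very long series the number of pairs grows quadratically, so it may be worth aggregating to weekly totals first.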

For the other search engines, crawl-related data can be extracted from your server logs using logfile analyzers, and a similar timeline can then be created and monitored.
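As a rough illustration of what a logfile analyzer does for this purpose, the sketch below counts crawler requests per day. It assumes the server writes Apache-style combined log lines, where the date appears in square brackets and the user agent is the last quoted field; the user-agent substrings used to recognize each crawler are assumptions that you should check against your own logs, and user-agent strings can be spoofed.

```python
import re
from collections import Counter
from datetime import datetime

# Date inside [...] and the final quoted field (the user agent).
LOG_LINE = re.compile(
    r'\[(?P<day>\d{2}/[A-Za-z]{3}/\d{4}):[^\]]*\].*"(?P<agent>[^"]*)"\s*$'
)

# Assumed lowercase substrings identifying each engine's crawler.
CRAWLER_TOKENS = {
    "google": ("googlebot",),
    "bing": ("bingbot", "msnbot"),
    "yahoo": ("slurp",),
}


def crawl_hits_per_day(lines):
    """Count requests per (ISO date, engine) for recognized crawlers."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent").lower()
        for engine, tokens in CRAWLER_TOKENS.items():
            if any(token in agent for token in tokens):
                day = datetime.strptime(match.group("day"), "%d/%b/%Y").date()
                counts[(day.isoformat(), engine)] += 1
                break
    return counts
```

Passing an open log file to crawl_hits_per_day yields daily totals per engine, which can be recorded over time in the same way as the indexation counts.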